Combining statistical machine translation and translation memories with domain adaptation

Abstract

Since the emergence of translation memory software, translation companies and freelance translators have been accumulating translated text for various languages and domains. This data can be used to train domain-specific machine translation systems for corporate or even personal use. But while the resulting systems usually perform well in translating domain-specific language, their out-of-domain vocabulary coverage is often insufficient due to the limited size of the translation memories. In this paper, we demonstrate that small in-domain translation memories can be successfully complemented with freely available general-domain parallel corpora such that (a) the number of out-of-vocabulary (OOV) words is reduced while (b) the in-domain terminology is preserved. In our experiments, German–French and German–Italian statistical machine translation systems geared to marketing texts of the automobile industry were significantly improved using Europarl and OpenSubtitles data, both in terms of automatic evaluation metrics and human judgement.
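As a rough illustration of claim (a) only, the sketch below counts test tokens that are missing from the training vocabulary, first with in-domain data alone and then with general-domain data added. The abstract does not say how the corpora are combined, so plain concatenation is an assumption here, as are the function names, the whitespace tokenization, and the toy sentences (all invented for illustration, not taken from the paper). It does not address claim (b), the preservation of in-domain terminology.

```python
def vocabulary(sentences):
    """Collect the set of lowercased whitespace tokens in a list of sentences."""
    return {token for sentence in sentences for token in sentence.lower().split()}


def oov_tokens(train_sentences, test_sentences):
    """Return the test tokens (with repeats) that never occur in the training data."""
    known = vocabulary(train_sentences)
    return [t for s in test_sentences for t in s.lower().split() if t not in known]


# Toy data, invented for illustration only (not from the paper).
in_domain = ["der motor leistet viel", "das fahrwerk ist sportlich"]
general_domain = ["das haus ist alt", "wir fahren heute nach hause"]
test = ["der motor ist alt", "wir fahren sportlich"]

# Assumption: the corpora are combined by simple concatenation.
oov_in_domain_only = oov_tokens(in_domain, test)
oov_combined = oov_tokens(in_domain + general_domain, test)

print(len(oov_in_domain_only), len(oov_combined))
```

On real data, the same count would be taken over a held-out in-domain test set, ideally with the tokenizer the translation system itself uses.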